Identification of Contaminants in Proteomics Mass Spectrometry Data
Abstract
Mass spectrometry (MS) is widely used for protein identification. In peptide mass fingerprinting, MS is used to determine the masses of the peptide fragments generated by enzymatic digestion of a protein. These peptide masses are then submitted to a search program, e.g., MASCOT or MSFIT, to identify the protein. The strategy is hampered, however, because the measured masses include not only those of the peptides but also those of contaminants present in the sample. Although the masses of some common, known contaminants (e.g., peptides generated by trypsin autolysis) are removed, many others are inadvertently included in the analysis. In this paper we present an approach for automatically identifying contaminant masses so that they can be removed before the identification step. For this purpose we have developed an algorithm that clusters mass values. We calculate the frequencies of all masses and propose that masses with a frequency higher than a given value are contaminants. In our analysis of 3,029 digested proteins, yielding 78,384 masses, we identified 16 possible contaminants. Of these 16, four are known trypsin autolysis peptides. Removing these contaminant masses from the database search will lead to more accurate and reliable protein identification.
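The abstract does not give the details of the clustering algorithm, so the following is only a minimal sketch of the general idea: pool the peak masses from all spectra, group masses that lie close together, and flag groups that occur in a large fraction of spectra. The function name `find_frequent_masses`, the gap-based clustering rule, the tolerance `tol` (in Da), and the threshold `min_fraction` are all assumptions made for illustration, not the authors' method or parameter values.

```python
from statistics import mean


def find_frequent_masses(spectra, tol=0.2, min_fraction=0.5):
    """Flag mass clusters that recur across many spectra.

    spectra: list of peak lists, one list of masses per digested protein.
    tol: assumed maximum gap (Da) between neighbouring masses in one cluster.
    min_fraction: assumed frequency threshold (fraction of spectra).
    Returns a list of (mean cluster mass, fraction of spectra) tuples.
    """
    # Pool every peak together with the index of the spectrum it came from.
    peaks = sorted((m, i) for i, spectrum in enumerate(spectra) for m in spectrum)

    # Simple gap-based clustering: start a new cluster whenever the next
    # sorted mass is more than `tol` away from the previous one.
    clusters = []
    for m, i in peaks:
        if clusters and m - clusters[-1][-1][0] <= tol:
            clusters[-1].append((m, i))
        else:
            clusters.append([(m, i)])

    flagged = []
    for cluster in clusters:
        # Frequency = share of spectra that contain at least one peak
        # from this cluster (repeats within one spectrum count once).
        fraction = len({i for _, i in cluster}) / len(spectra)
        if fraction >= min_fraction:
            flagged.append((round(mean(m for m, _ in cluster), 3), fraction))
    return flagged


if __name__ == "__main__":
    # Invented toy data, not values from the paper.
    toy_spectra = [
        [842.51, 1045.56, 1500.70],
        [842.50, 1234.60, 2211.10],
        [842.52, 1045.55, 1800.90],
        [900.40, 2211.11],
    ]
    print(find_frequent_masses(toy_spectra, tol=0.2, min_fraction=0.75))
```

In this toy input only the cluster near 842.5 appears in at least three of the four spectra, so it is the only one flagged at `min_fraction=0.75`. A real implementation would need a tolerance matched to the instrument's mass accuracy and a threshold chosen from the observed frequency distribution.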
Related articles
MaConDa: a publicly accessible mass spectrometry contaminants database
Mass spectrometry is widely used in bioanalysis, including the fields of metabolomics and proteomics, to simultaneously measure large numbers of molecules in complex biological samples. Contaminants routinely occur within these samples, for example, originating from the solvents or plasticware. Identification of these contaminants is crucial to enable their removal before data analys...
Proteomic Analysis of Gene Expression in Basal Cell Carcinoma
Background: Basal Cell Carcinoma (BCC) is a type of non-melanoma skin cancer. Alteration in gene expression is an important event in cancer cells, and it can be detected by proteomics techniques. Methods: Normal and tumor tissues were taken from BCC patient. Total proteins were purified by standard methods, and proteins were separated by two-dimensional electrophoresis...
Proteome analysis of Cryptosporidium parvum and C. hominis using two-dimentional electrophoresis, image analysis and tandem mass spectrometry
Until recently, Cryptosporidium was thought to be a single-species genus. Molecular studies now show that there are at least 10 valid species of this parasite. Among them, two morphologically identical species, C. hominis and C. parvum, are the most pathogenic identified to date, and their genomes are 97% identical. Post-genomic analysis is therefore necessary to explore further the...
COMPASS: a suite of pre- and post-search proteomics software tools for OMSSA.
Here we present the Coon OMSSA Proteomic Analysis Software Suite (COMPASS): a free and open-source software pipeline for high-throughput analysis of proteomics data, designed around the Open Mass Spectrometry Search Algorithm. We detail a synergistic set of tools for protein database generation, spectral reduction, peptide false discovery rate analysis, peptide quantitation via isobaric labelin...
Peptide identification in whole-sample mass spectrometry proteomics.
Peptide identification for whole-sample mass spectrometry (MS) proteomics is in its infancy. While sophisticated tandem MS/MS instrumentation exists for accurate peptide identification after sample separation, there are few options for those who produce data from intact protein samples. We present a novel algorithm which uses available information from the literature and online protein database...